ABSTRACT


Circulating  tumor  cells  (CTC)  mediate  metastatic  spread  of  many  solid  tumors.  Our  proposal 
exploited  a  new  technology,  MagSweeper,  for  isolation  of  CTCs  from  patients  with  prostate 
cancer.  MagSweeper  is  an  automated  immunomagnetic  separation  technology  that  gently 
extracts  CTCs  from  whole  blood,  allowing  the  possibility  of  characterization  of  the  cells 
molecularly.  We  have  now  successfully  demonstrated  that  the  MagSweeper  isolates  completely 
purified  CTCs  while  preserving  the  integrity  and  viability  of  these  fragile  cells  and  eliminating 
contamination  from  nonspecific  adsorption  or  entrapment  of  blood  cells.  We  have  optimized 
the  isolation  of  spiked  LNCaP  cells  from  the  blood  of  healthy  donors  and  then  from  the  isolated 
cells  performed  whole  transcriptome  mRNA-Seq.  MagSweeper  can  effectively  isolate  these  cells 
with  a  high  capture  efficiency  and  purity.  In  addition,  the  MagSweeper  isolation  process  does 
not  significantly  perturb  the  transcriptional  signature  of  isolated  single  cells.  We  also  have  been 
able  to  isolate  CTCs  from  patients  with  metastatic  prostate  cancer,  and  the  numbers  of  cells 
isolated  were  comparable  to  CellSearch,  a  commercial  FDA  approved  CTC  counting  technology. 
Although the quality of RNA from the CTCs showed signs of degradation, we have been able
to  generate  whole  transcriptome  profiles  from  single  cells  from  patients  using  mRNA-Seq. 
Expressed  transcripts  showed  the  cells  were  of  prostatic  origin  and  reflected  known  aspects  of 
prostate  cancer  biology.  These  results  demonstrate  that  the  MagSweeper  provides  access  to 
intact  CTCs  and  that  it  is  possible  to  extract  biologically  and  clinically  relevant  information  from 
molecular  analysis  of  single  CTCs. 

Introduction: 

Our  proposal  exploits  a  new  technology,  MagSweeper,  for  isolation  of  Circulating  Tumor  Cells 
from  patients  with  prostate  cancer.  MagSweeper  is  an  automated  immunomagnetic  separation 
technology  that  gently  extracts  CTCs  from  whole  blood,  allowing  the  possibility  of 
characterization  of  the  cells  molecularly.  The  MagSweeper  isolates  completely  purified  CTCs 
while  preserving  the  integrity  and  viability  of  these  fragile  cells  and  eliminating  contamination 
from  nonspecific  adsorption  or  entrapment  of  other  blood  cells.  We  proposed  3  specific  aims  in 
this project. In this closing report, we describe significant progress on all proposed aims.
Our research has also resulted in a research publication, which is attached at the end of
the report.


Specific  Aim  1:  To  optimize  MagSweeper  technology  for  purification  of  prostate  CTCs: 

Our  first  objective  was  to  utilize  our  optimized  MagSweeper  protocol  for  isolation  of  circulating 
tumor cells from prostate cancer patients. However, to further improve the
performance of MagSweeper, we continued cell spiking experiments (n=54 experiments and 11
blood samples). In these experiments, we spiked 100 LNCaP cells into 3.75 ml of whole blood
and isolated the cells using MagSweeper. Prior to spiking, LNCaP cell viability, measured by
Trypan blue exclusion, was 89%±8%. In addition, LNCaP cells were fluorescently labeled with
antibodies against EpCAM and CD45 to distinguish LNCaPs from WBCs. DAPI stain was used to
identify dead (membrane compromised) cells. Post-isolation we recovered a mean of 81%±16%
of live spiked LNCaP cells
(EpCAM+/CD45-/DAPI-) (Figure 1B) while DAPI staining revealed an additional 12%±6% of the
isolated  LNCaP  cells  that  were  membrane  compromised  as  occurs  in  dead  cells  (EpCAM+/CD45- 
/DAPI+).  Since  the  original  cell  spike  contained  11%  dead  cells  (measured  by  Trypan  blue 
exclusion)  recovery  of  12%  DAPI  positive  cells  following  MagSweeping  indicates  that  damage  to 
spiked  LNCaP  cells  during  the  entire  procedure  is  minimal.  Normal,  unspiked  blood  samples 
failed  to  yield  EpCAM  positive  cells  (data  not  shown). 
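
The recovery arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the per-experiment counts are hypothetical, and the function name `recovery_summary` is an assumption, not part of our analysis pipeline. The final comparison uses the two percentages reported in the text (11% dead cells in the spike, 12% DAPI positive cells after isolation).

```python
from statistics import mean, stdev

def recovery_summary(spiked, live_recovered, dead_recovered):
    """Return ((mean, SD) live %, (mean, SD) dead %) across experiments."""
    live_pct = [100.0 * n / spiked for n in live_recovered]
    dead_pct = [100.0 * n / spiked for n in dead_recovered]
    return (mean(live_pct), stdev(live_pct)), (mean(dead_pct), stdev(dead_pct))

# Hypothetical counts from three spiking experiments (100 cells spiked each).
live = [80, 90, 70]   # EpCAM+/CD45-/DAPI- cells recovered
dead = [10, 12, 14]   # EpCAM+/CD45-/DAPI+ cells recovered
(live_mean, live_sd), (dead_mean, dead_sd) = recovery_summary(100, live, dead)
print(f"live: {live_mean:.0f}% +/- {live_sd:.0f}%")
print(f"dead: {dead_mean:.0f}% +/- {dead_sd:.0f}%")

# Values reported in the text: dead fraction before and after isolation.
pre_spike_dead_pct = 11.0
post_isolation_dead_pct = 12.0
excess_dead_pct = post_isolation_dead_pct - pre_spike_dead_pct
print(f"excess dead cells attributable to the procedure: {excess_dead_pct:.0f} points")
```

A small difference between the pre-spike and post-isolation dead fractions is what supports the conclusion that the procedure causes minimal damage.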

KEY  RESEARCH  ACCOMPLISHMENTS: 

•  MagSweeper and the optimized protocol can consistently isolate EpCAM expressing
LNCaP with high capture efficiency and purity and with minimal cell death.

Specific  Aim  2:  To  test  whether  MagSweeper  can  successfully  isolate  CTCs  from  patients  with 
localized  and  metastatic  prostate  cancer 

We have applied this same CTC isolation protocol to circulating tumor cells from
metastatic prostate cancer patients. So far we have recruited 82 patients with metastatic
castrate  resistant  prostate  cancer  (CRPC)  and  found  live  CTCs  in  23  of  the  patients.  Live  CTCs 
isolated  from  patient  blood  samples  were  examined  for  PSA  and  AR  expression  to  discern  the 
origin  of  the  CTCs.  We  tested  6  CTCs  and  found  all  expressed  PSA  while  5/6  were  positive  for  AR 
expression  demonstrating  that  these  CTCs  are  prostate  derived.  We  have  recently  attempted 
to  isolate  CTCs  from  patients  with  localized  prostate  cancer.  Thus  far,  we  have  obtained  no 
CTCs  from  4  patients  prior  to  or  immediately  after  surgery  for  localized  prostate  cancer. 
Currently we are analyzing the clinical data from our patients with CRPC to evaluate whether cell
number correlates with clinical features of their disease and the effects of therapy on CTC
number.  We  anticipate  preparation  and  submission  of  a  manuscript  in  this  calendar  year. 

KEY  RESEARCH  ACCOMPLISHMENTS: 

•  We  were  able  to  isolate  live  circulating  tumor  cells  from  prostate  cancer  patients. 

•  We were able to establish the purity and specificity of the isolated CTCs using
prostate-specific marker assays.




Specific  Aim  3:  Perform  gene  expression  biomarker  assays  on  individual  CTCs  isolated  from 
patients. 

In  our  initial  application  we  proposed  doing  a  multiplex  qPCR  analysis  of  CTC  using  prostate 
specific  marker  genes.  This  includes  finding  ways  to  use  Fluidigm  technology  which  can 
accommodate  up  to  384  genes  for  comparison  of  gene  expression  profiles.  Since  our  proposal 
started,  there  has  been  tremendous  progress  in  DNA  sequencing  both  in  terms  of  technology 
and  cost.  We  have  begun  using  NexGen  single  cell  sequencing  methodology  developed  at 
Illumina Inc. (Hayward, CA) to interrogate gene expression (single cell RNA-Seq).

MagSweeper  isolation  has  minimal  effects  on  single  cell  transcriptomes 

To  demonstrate  that  the  MagSweeper  isolation  process  does  not  affect  the  global 
transcriptional  profile  of  isolated  cells,  we  performed  single  cell  mRNA-Seq  and  detailed 
bioinformatics analysis on 2 groups of LNCaP cells. The first set contained 4 fresh LNCaP cells picked
just  prior  to  spiking  into  blood,  and  the  second  set  contained  4  LNCaPs  after  MagSweeper 
isolation  from  spiked  normal  blood.  Isolated  cells  were  stored  frozen  for  at  least  a  month  to 
simulate  storage  conditions.  BioAnalyzer  traces  of  amplified  cDNA  from  a  fresh  and  a 
MagSweeper  isolated  cell  revealed  similar  molecular  weight  peaks  of  amplification  products 
centered  at  approximately  1000  base  pairs  (bp)  (Figure  2A). 

To  study  genes  differentially  expressed  between  fresh  and  MagSweeper  isolated  single  cells,  we 
first  used  the  Bioconductor  edgeR  package.  We  identified  only  1  gene  as  differentially 
expressed  between  MagSweeper-isolated  and  control  LNCaP  cells  at  a  false  discovery  rate 
(FDR)  cutoff  of  0.05  and  none  at  a  cutoff  of  0.01.  We  also  explored  several  other  methods  for 
identifying  differences  between  the  fresh  and  MagSweeper  isolated  cells.  First,  to  characterize 
the  degree  of  cell-to-cell  variability,  we  asked  how  many  genes  had  a  very  high  fragments  per 
kilobase  of  exon  per  million  fragments  mapped  (FPKM)  (>10)  in  at  least  1  sample  and  a  very  low 
FPKM  (<0.1)  in  at  least  one  sample.  We  considered  three  different  groups  of  samples:  (1)  4 
fresh  cells  (2)  4  MagSwept  cells  and  (3)  2  fresh  and  2  MagSwept  cells.  The  number  of  highly 
variable  genes  in  four  fresh  cells  (215),  four  MagSwept  cells  (268)  and  a  combination  of  2  fresh 
and  2  MagSwept  cells  (239)  was  similar  in  all  groups  and  likely  reflects  cell  to  cell  heterogeneity. 
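
The highly variable gene count described above can be sketched as follows. This is a minimal illustration under stated assumptions: the gene names and FPKM values are hypothetical, and `count_highly_variable` is a name chosen here for the example. Only the two thresholds (FPKM >10 and FPKM <0.1) come from the text.

```python
def count_highly_variable(fpkm_by_gene, high=10.0, low=0.1):
    """Count genes with FPKM > high in at least one sample
    and FPKM < low in at least one sample."""
    count = 0
    for values in fpkm_by_gene.values():
        if any(v > high for v in values) and any(v < low for v in values):
            count += 1
    return count

# Hypothetical FPKM values: one list per gene, one entry per single cell.
fpkm = {
    "GENE_A": [25.0, 0.0, 12.0, 30.0],   # very high in some cells, absent in one
    "GENE_B": [15.0, 11.0, 14.0, 18.0],  # consistently high, so not counted
    "GENE_C": [0.0, 0.05, 0.0, 0.0],     # consistently low, so not counted
    "GENE_D": [0.02, 40.0, 5.0, 1.0],    # very low in one cell, very high in another
}
n_variable = count_highly_variable(fpkm)
print(n_variable)
```

Running the same count on each of the three sample groupings (4 fresh, 4 MagSwept, 2 fresh plus 2 MagSwept) gives the comparison reported above.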

Next we looked for genes that were relatively consistent across all the samples of a given type.
We reasoned differential expression between the two sets of samples might be more readily
detectable in this set of genes. Using this set of genes, we calculated the correlation of the
number of reads per gene between each pair of samples, and found that the within-group
correlations were no stronger than the between-group correlations; that is, the cells did not
cluster based on isolation method (Figure 2B). Finally, in a separate
study, we ran an Illumina expression microarray on pools of 10,000 LNCaP cells pre- and
post-MagSweeper isolation. Again, there was high cross-correlation (R2 = 0.985) between
MagSweeper  isolated  cells  and  controls  indicating  that  MagSweeper  produces  minimal 
alterations  in  gene  expression  (data  not  shown). 
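
The within-group versus between-group comparison described above can be sketched as follows. This is an illustration only: the read counts and sample names are hypothetical, Pearson correlation is assumed here as the correlation measure, and the helper names are not taken from our analysis code.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def group_correlations(counts, groups):
    """Mean pairwise correlation within groups and between groups."""
    within, between = [], []
    for a, b in combinations(counts, 2):
        r = pearson(counts[a], counts[b])
        (within if groups[a] == groups[b] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical reads per gene (five genes) for four single cells.
counts = {
    "fresh_1": [120, 30, 0, 55, 900],
    "fresh_2": [100, 45, 2, 60, 850],
    "swept_1": [110, 35, 1, 50, 870],
    "swept_2": [130, 28, 0, 65, 910],
}
groups = {"fresh_1": "fresh", "fresh_2": "fresh",
          "swept_1": "MagSwept", "swept_2": "MagSwept"}
within_mean, between_mean = group_correlations(counts, groups)
print(f"within: {within_mean:.4f}  between: {between_mean:.4f}")
```

If isolation altered expression, the within-group mean would be expected to exceed the between-group mean; similar values are consistent with cells not clustering by isolation method.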

mRNA-Seq of single prostate cancer CTCs

We prepared amplified cDNA from 67 CTCs isolated from 13 prostate cancer patients. CTCs
were isolated after MagSweeping based on immunofluorescent staining for cells that were
(EpCAM+/CD45-/DAPI-). Unlike cultured LNCaP cells, we found that there was a wide range of
sizes  of  amplified  cDNAs  in  patient-derived  CTCs,  reflective  of  initial  RNA  quality.  We 
characterized  traces  of  amplification  products  into  3  groups:  1)  those  with  peaks  centered 
around  1000  bps  (good  quality),  2)  traces  with  intermediate  length  amplification  products 
(partially  degraded),  and  3)  traces  with  predominantly  low  molecular  weight  amplification 
products  (degraded)  (Figure  2C).  Looking  across  all  67  CTCs,  21%  had  good  quality  RNA,  37% 
were  partially  degraded,  and  42%  of  the  samples  were  degraded  (Figure  2D).  RNA  quality 
tended  to  be  somewhat  patient  specific  -  for  example  patient  2  yielded  the  highest  number 
(8/12)  of  good  quality  RNA  samples.  Using  RNA  quality  as  a  guide,  we  sequenced  the  libraries  of 
24  CTCs  which  included  representatives  of  all  three  RNA  quality  classifications,  and  aligned 
sequences to the human genome (build hg19) (Figure 2D, % alignment). Based on sequence
alignment of greater than five percent, sequence data for 20 CTCs collected from 4 patients (P1,
P2,  P3  and  P4)  were  used  for  in  depth  analysis  (Figure  2E). 

mRNA-Seq  data  quality 

To  assess  the  mRNA-Seq  data  quality,  coverage  was  calculated  across  a  set  of  600  hand-picked 
genes  for  quality  control  check  of  mRNA-seq  data.  The  sequencing  data  from  the  single  LNCaP, 
PC-3, and T24 cells passed the quality control standards of >60% alignment and <65% median
coverage CV typically applied to large RNA input mRNA-Seq data sets (Figure 3A). In contrast,
CTCs  displayed  higher  coverage  median  CVs  and  lower  percentage  of  alignments  than  cultured 
cells.  Typical  coverage  plots  for  cell  lines,  normal  prostate  tissues,  and  CTCs  are  shown  in  Figure 
3B.  The  cell  lines  displayed  smooth  coverage  across  the  length  of  the  transcript,  while  the 
normal  prostate  samples  had  a  slight  3'  bias,  typical  for  tissue  samples  (Figure  3B).  The  CTCs 
had  a  wide  range  of  coverage  bias,  as  shown.  Since  oligo-dT  was  used  to  prime  cDNA  synthesis, 
a  3-prime  bias  in  the  coverage  data  suggests  mRNA  degradation. 
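
The coverage metrics discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-position coverage values are hypothetical, "at least 1X coverage" is interpreted here as a mean coverage of at least 1, the population standard deviation is used for the CV, and the 3' fraction is a simple stand-in for the bias seen in the coverage plots rather than a metric we report.

```python
from statistics import mean, median, pstdev

def coverage_cv(coverage):
    """Coefficient of variation (SD / mean) of coverage along one gene."""
    return pstdev(coverage) / mean(coverage)

def median_coverage_cv(coverage_by_gene, min_mean_coverage=1.0):
    """Median of per-gene coverage CVs, skipping genes below the coverage floor."""
    cvs = [coverage_cv(c) for c in coverage_by_gene.values()
           if mean(c) >= min_mean_coverage]
    return median(cvs)

def three_prime_fraction(coverage):
    """Fraction of total coverage falling in the 3' half of the transcript."""
    half = len(coverage) // 2
    return sum(coverage[half:]) / sum(coverage)

# Hypothetical coverage in four bins from the 5' end (left) to the 3' end (right).
coverage_by_gene = {
    "even_gene":   [10, 10, 10, 10],  # smooth coverage, as in the cell lines
    "biased_gene": [0, 0, 10, 30],    # strong 3' bias, as expected for degraded RNA
    "low_gene":    [0, 0, 0, 1],      # below the coverage floor, excluded
}
median_cv = median_coverage_cv(coverage_by_gene)
bias = three_prime_fraction(coverage_by_gene["biased_gene"])
print(f"median CV: {median_cv:.3f}, 3' fraction of biased gene: {bias:.2f}")
```

With oligo-dT priming, degraded transcripts yield reads concentrated near the 3' end, which raises both the per-gene CV and the 3' fraction.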

mRNA-Seq  data  content 

We looked at expression levels of several prostate markers and a leukocyte marker to establish
that the patient-derived cells were of prostatic origin: androgen receptor (AR),
prostate-specific antigen (PSA, KLK3), TMPRSS2, and the leukocyte marker CD45 in all CTC
samples. Figure 4 shows the FPKM of each of these markers in patient CTCs, tissue culture cell
lines and normal prostate, with values of > 1 FPKM shaded in green. All but one of the CTCs was
positive for at
least  one  of  the  prostate  markers.  LNCaP  and  normal  prostate  showed  expected  expression  of 
these prostate markers. PC-3 lacked KLK3 and AR expression, as expected (Paris et al., 2009), but
did  express  TMPRSS2.  Importantly,  all  the  CTCs  and  the  cell  lines  were  negative  for  the  WBC 
marker  CD45.  Normal  prostate,  which  is  comprised  of  many  cell  types  including  WBCs,  was 
positive  for  CD45  expression. 
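
The marker calls described above can be sketched as follows. The FPKM values are transcribed from a handful of rows of Figure 4; the calling function, its name, and the choice to treat FPKM > 1 as "positive" for every marker are assumptions made for this illustration, mirroring the green shading threshold in the figure.

```python
PROSTATE_MARKERS = ("AR", "KLK3", "TMPRSS2")

def call_markers(fpkm, threshold=1.0):
    """Return (prostate_positive, cd45_positive) for one sample."""
    prostate_positive = any(fpkm[g] > threshold for g in PROSTATE_MARKERS)
    cd45_positive = fpkm["CD45"] > threshold
    return prostate_positive, cd45_positive

# Selected rows transcribed from Figure 4 (FPKM).
samples = {
    "LNCaP.1": {"AR": 61.49, "KLK3": 3142.72, "TMPRSS2": 217.78, "CD45": 0.00},
    "PC3.1":   {"AR": 0.04, "KLK3": 0.04, "TMPRSS2": 7.94, "CD45": 0.05},
    "P1.4":    {"AR": 0.02, "KLK3": 0.49, "TMPRSS2": 270.54, "CD45": 0.00},
    "N1":      {"AR": 5.53, "KLK3": 13232.71, "TMPRSS2": 543.97, "CD45": 2.98},
    "T24.1":   {"AR": 0.02, "KLK3": 0.13, "TMPRSS2": 0.05, "CD45": 0.00},
}
calls = {name: call_markers(values) for name, values in samples.items()}
for name, (prostate, cd45) in calls.items():
    print(f"{name}: prostate marker positive={prostate}, CD45 positive={cd45}")
```

In these rows, PC-3 is called prostate positive through TMPRSS2 alone, the bladder cancer line T24 is negative for all markers, and normal prostate tissue is positive for CD45, in line with the text.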

CTC  Pathway  Analysis 

To  find  genes  and  pathways  that  were  differentially  activated  in  CTCs  we  compared  transcript 
profiles  of  CTCs  to  those  from  normal  prostate  tissue  and  focused  on  genes  that  were  over 
expressed  in  the  CTCs.  We  reasoned  that  the  varying  degrees  of  RNA  degradation  in  the  CTCs 
would  lead  to  unreliable  counting  of  under  expressed  genes  in  the  CTCs,  especially  since  the 
half-life  of  RNA  varies  from  gene  to  gene.  We  used  a  manual  thresholding  method  to  identify 
genes  that  were  over  expressed  in  CTCs.  Specifically,  we  selected  genes  that  were  at  least  100- 
fold  higher  in  at  least  2  of  the  CTCs  compared  to  normal  prostate  tissue  and  that  contained  at 
least  10  mapped  reads  in  at  least  one  sample.  By  selecting  transcripts  that  are  highly 
differentially  expressed,  we  also  compensate  for  the  fact  that  a  fraction  of  the  prostate  tissue 
samples  are  epithelial  in  origin,  at  the  expense  of  a  high  false  negative  rate. 
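
The manual thresholding rule described above can be sketched as follows. This is an illustration under stated assumptions: gene names and values are hypothetical, the normal prostate samples are summarized by a single expression value per gene, and a small pseudocount is added to avoid division by zero. None of these choices, nor the function name, is taken from our analysis code; only the three thresholds (100-fold, 2 CTCs, 10 mapped reads) come from the text.

```python
def overexpressed_genes(ctc_fpkm, normal_fpkm, max_reads,
                        fold=100.0, min_ctcs=2, min_reads=10, pseudocount=0.01):
    """Select genes at least `fold` higher than normal tissue in at least
    `min_ctcs` CTCs, with at least `min_reads` mapped reads in some sample."""
    selected = []
    for gene, values in ctc_fpkm.items():
        baseline = normal_fpkm[gene] + pseudocount
        n_high = sum(1 for v in values if (v + pseudocount) / baseline >= fold)
        if n_high >= min_ctcs and max_reads[gene] >= min_reads:
            selected.append(gene)
    return selected

# Hypothetical expression values: one entry per CTC.
ctc_fpkm = {
    "GENE_X": [500.0, 800.0, 0.0],  # >100-fold in two CTCs
    "GENE_Y": [500.0, 1.0, 0.5],    # >100-fold in only one CTC
    "GENE_Z": [50.0, 60.0, 70.0],   # similar to normal tissue
    "GENE_W": [30.0, 40.0, 0.0],    # large fold change but very few reads
}
normal_fpkm = {"GENE_X": 2.0, "GENE_Y": 2.0, "GENE_Z": 40.0, "GENE_W": 0.0}
# Largest mapped read count observed for each gene in any single sample.
max_reads = {"GENE_X": 250, "GENE_Y": 300, "GENE_Z": 120, "GENE_W": 4}

hits = overexpressed_genes(ctc_fpkm, normal_fpkm, max_reads)
print(hits)
```

Requiring a large fold change in more than one cell, together with a minimum read count, trades sensitivity for specificity, which is the high false negative rate noted above.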

We  identified  181  genes  over-expressed  in  CTCs  compared  to  normal  prostate  tissue.  The  list 
contains  many  interesting  transcripts  germane  to  prostate  cancer  biology.  For  instance,  all  3 
CTCs  from  patient  1  expressed  high  levels  of  SPINK1,  a  transcript  and  protein  identified  as 
elevated  in  TMPRSS2-ERG  fusion  negative  prostate  cancers  and  associated  with  aggressive 
prostate  cancer  (Punnoose  et  al.,  2010).  In  addition,  there  were  a  significant  number  of  cell 
cycle  and  mitosis  associated  transcripts  in  the  highly  expressed  gene  set  including  CCNA2, 
CCNAB1  and  B2,  CDC20,  CDCA3,  CDCA3,  CDCA8,  CDL1,  CDKN3,  CENPE,  CENPI,  CENPK,  CD28  and 
ASPM.  Notably  4  transcripts  (TOP2A,  TK1,  TPX2  and  KIAA0101)  expressed  at  high  levels  in  CTCs 
were  found  in  a  list  of  31  transcripts  associated  with  disease  recurrence  after  radical 
prostatectomy  we  published  recently  (Racila  et  al.,  1998).  CTCs  from  patients  1  and  2  expressed 
high  levels  of  BIRC5  (Survivin),  an  anti-apoptotic  gene  expressed  at  high  levels  in  castration- 
resistant  prostate  cancer.  Finally,  CTCs  expressed  high  levels  of  cancer  associated  transcripts 
(BAGE, BAGE3, CT45A1, CT45A4, CT45A5, CT45A6, CTAG1B, CTAG2, MAGEA12, MAGEA1,
MAGEA3,  MAGEA6,  MAGEC1,  MAGEC2,  and  PTTG1)  and  transcripts  important  in  regulating 
development  (HOXB7,  HOXB8,  HOXB9,  NANOGNB,  and  LOC404266). 

Using  Ingenuity  Pathway  Analysis  (IPA),  we  looked  for  overrepresented  pathways  and  gene 
sets. The top Diseases and Disorders category was Cancer and the top Canonical Pathway was
"Cell Cycle: G2/M DNA Damage Checkpoint Regulation". Looking specifically at genes overlapping
the IPA function Prostate Cancer, 9 genes were identified: AR, TK1, PLK1, MAGEA1, MAGEC1,
MAGEC2, CTAG1B, BIRC5, and TOP2A. Notably, the CTCs from patient 2 contributed most
significantly to these results. Repeating these analyses with only genes identified in P2 versus
normal  prostate  tissue  produced  very  similar  results  with  highly  significant  p-values. 
Interestingly,  excluding  P2  from  the  analysis  yields  a  weaker  but  still  significant  association  with 
prostate  cancer.  This  is  consistent  with  the  observation  that  the  CTCs  from  P2  yielded 
significantly  better  quality  libraries. 

KEY  RESEARCH  ACCOMPLISHMENTS: 

•  We were able to test a single cell whole transcriptome amplification and sequencing
protocol on 20 CTCs isolated from 4 prostate cancer patients.

•  Using  this  protocol  we  could  reproducibly  amplify  66%  of  mRNA  pool  from  a  single  cell. 

•  The gene expression / sequencing profile obtained from single CTCs
demonstrates their origin from prostate.

•  IPA analysis suggests a link between the gene expression profile of CTCs and
prostate cancer development.

REPORTABLE  OUTCOMES: 

Our  manuscript  entitled  "mRNA-Seq  of  Single  Prostate  Cancer  Circulating  Tumor  Cells  Reveals 
Recapitulation  of  Gene  Expression  and  Pathways  Found  in  Prostate  Cancer"  was  published  in 
PLoS-One  in  Nov  2012.  This  manuscript  is  highly  accessed  at  the  PLoS-ONE  website,  has  been 
circulated  in  social  media,  and  has  already  been  cited  12  times  per  Google  Scholar. 

Overall: 

We  are  very  excited  to  report  significant  progress  made  on  all  3  specific  aims.  We  have 
optimized  protocols  for  prostate  specific  CTC  isolation  and  single  cell  whole  transcriptome 
sequencing which has now been tested in cells isolated from a number of metastatic prostate
cancer  patients.  We  believe  that  generating  a  dataset  of  CTC  transcriptomes  will  provide 
significant  insights  into  CTC  biology  and  could  have  clinical  utility.  Our  efforts  are  continuing  in 
collecting  CTCs  from  prostate  cancer  patients  and  banking  these  CTCs  for  characterization.  We 
are  looking  into  better  and  improved  NexGen  gene  enrichment  characterization  methods. 

Future  Aims: 

We have expanded our CTC research efforts to improve sensitivity for both EpCAM+ and
EpCAM- CTCs. There is general agreement within the CTC community that CTC populations are
diverse, with cells possibly undergoing transient epithelial to mesenchymal transition (EMT)
and/or its converse, MET. Cells that undergo EMT can lose expression of epithelial surface
markers such as EpCAM, making EpCAM-based isolation impossible. Intriguing data in breast
cancer suggest that EMT and MET transitions can be operative in acquisition of resistance to
chemotherapy and biological therapies (Yu et al., 2013). If that is the case, measurement of
all types of CTCs
could  be  important  in  assessing  response  to  therapy  and  could  serve  as  a  guide  to  therapy.  So  a 
CTC  capture  strategy  effective  for  all  cell  populations  could  provide  a  better  measure  for 
estimating  the  risk  of  the  disease,  possibility  of  recurrence  and  response  to  therapy.  To  that 
end,  we  are  developing  an  antibody  cocktail  that  could  capture  both  epithelial  and 
mesenchymal  populations  of  cells.  We  plan  to  apply  this  capture  method  to  early  and  late  stage 
prostate  cancer  patients.  In  addition,  preliminary  data  suggests  our  antibody  cocktail  might  be 
broadly  applied  in  other  cancer  types. 

Moving  forward,  we  plan  to  apply  for  additional  funding  to  expand  our  studies  of  CTCs  and  are 
formulating  several  aims.  Very  likely  these  will  include: 

•  To  test  whether  our  new  antibody  cocktail  provides  higher  yields  and  sensitivity 
in  isolating  prostate  cancer  CTCs  across  a  broader  range  of  patients. 

•  To  develop  downstream  analytics  to  characterize  genomic  alterations  and 
compare  those  alterations  to  tumor  biopsy  tissues 

•  To  develop  analytic  capacities  that  allow  objective  scoring  of  CTCs  for  EMT  and 
MET 

•  To  correlate  CTC  number,  EMT  and  genomic  features  with  response  to  therapy  in 
CRPC. 
Figure 1. CTC isolation by the MagSweeper. (A) Image of hood prototype of the MagSweeper.
(B) Percent capture of 100 LNCaP cells spiked into 3.75 ml of normal blood (N=11 experiments,
54  samples  and  11  donors).  Blue  circles  show  mean  percent  recoveries  of  live  EpCAM  (+)/CD45 
(-)/DAPI(-)  cells  and  red  circles  show  mean  recoveries  of  membrane  compromised  EpCAM 
(+)/CD45  (-)/DAPI(+)  cells.  Error  bars  represent  +/- 1  S.D.  (C)  Percentage  capture  and  purity  of 
10  LNCaP  cells  isolated  from  7.5  ml  of  normal  blood.  Blue  bars  are  the  overall  mean  percent 
recovery  of  cells  after  MagSweeper  and  single  cell  isolation  while  purple  bars  show  purity  of 
LNCaP  cells  after  MagSweeper  and  single  cell  isolation  (N=4  experiments,  6  samples,  4  donors). 
Error  bars  represent  +/- 1  S.D.  (D)  Bright  field  and  fluorescent  images  of  CTCs  isolated  from  a 
prostate  cancer  patient  blood  sample  and  contaminating  WBC.  Scale  bar  =  20  microns.  (E) 
MagSweeper  versus  CellSearch  comparison.  Samples  with  0  CTC  were  assigned  a  value  of  1. 
Figure 2. MagSweeper isolation has minimal effects on single cell transcriptomes. (A)
Bioanalyzer traces of a single fresh LNCaP cell prior to MagSweeping (4 unlabeled LNCaP), and a
single LNCaP post MagSweeper isolation (5 magSwept LNCaP). 12.5 pg of LNCaP total RNA
served as a positive control and cell isolation buffer was used as a negative control. (B) Heatmap
of correlations between fresh and MagSwept single-cell RNA-Seq data and table of correlations
between fresh and MagSwept samples. (C) Representative Bioanalyzer traces of good,
intermediate, and poor CTC cDNA amplification products. (D) Grading of CTC RNA quality based
on cDNA amplification products: green indicates high quality, tan indicates partially
degraded RNA and red indicates degraded RNA. % alignment is the percentage of
mRNA-Seq reads aligned to the human genome. Patient CTC# indicates the single patient CTCs
analysed in this study.
Figure 3. Alignment metrics. (A) Percentage of PF reads that aligned, percentage of alignments
that map to coding regions, median coverage CV, and percentage of reads that map to the five
genes with the highest number of mapped reads. To calculate the "Median CV", first the CV of
coverage was calculated across each of 600 genes in a hand-picked set of housekeeping genes,
and then the median of these CVs was taken (considering only the genes with at least 1X
coverage). The "% aligned" is the percentage of PF reads that align to the genome or to a splice
site, excluding mitochondrial and ribosomal RNA. Based on historical data, expected values for
median CV and % aligned are <65% and >60%, respectively. Coding alignments are the % of
reads that map to exons. The % of reads aligned to the top 5 genes for each sample is shown.
(B) Examples of the average length-normalized coverage across the 600 housekeeping genes,
from samples LNCaP.3, N1, P2.1, and P3.4. Position 0 is the 5 prime end of the transcript and 100
is the extreme 3 prime end of the transcript. The 3 prime biases displayed in the curves for N1
and CTC P3.4 suggest varying degrees of RNA degradation.




Group                     Sample          AR       KLK3   TMPRSS2    CD45
LNCaPs (prostate cancer)  LNCaP.1      61.49    3142.72    217.78    0.00
LNCaPs (prostate cancer)  LNCaP.2      43.65    3080.62     76.18    0.01
LNCaPs (prostate cancer)  LNCaP.3      64.54    3565.35    313.96    0.00
LNCaPs (prostate cancer)  LNCaP.4      63.07     556.68     56.99    0.00
PC3s (prostate cancer)    PC3.1         0.04       0.04      7.94    0.05
PC3s (prostate cancer)    PC3.2         0.08       0.00      6.4:    0.00
CTC P1                    P1.1         55.62    5198.41    160.39    0.00
CTC P1                    P1.3          2.09    1059.38      0.00    0.00
CTC P1                    P1.4          0.02       0.49    270.54    0.00
CTC P2                    P2.1        632.46     803.95     89.54    0.00
CTC P2                    P2.2        861.24     753.98    188.74    0.00
CTC P2                    P2.3        362.56       0.00      0.00    0.00
CTC P2                    P2.4        631.11     265.89     94.24    0.00
CTC P2                    P2.5       1081.01     561.68     60.14    0.00
CTC P2                    P2.6        255.55     766.24    144.72    0.00
CTC P2                    P2.7        410.36    1210.38    229.07    0.00
CTC P2                    P2.8        123.03      16.66     16.55    0.26
CTC P2                    P2.mix      556.53     643.70     84.98    0.10
CTC P2                    P2.9         63.72    1335.89      1.12    1.34
CTC P2                    P2.10       542.69     877.39     34.08    0.00
CTC P3                    P3.1        209.45    5986.06      2.15    0.00
CTC P3                    P3.2        178.64     970.22    186.98    0.00
CTC P3                    P3.3         82.66    4908.04    182.96    0.00
CTC P3                    P3.4        801.58    8957.21     70.38    0.00
CTC P5                    P5.1          0.00      40.42      0.00    0.17
CTC P5                    P5.2        266.27      63.56     11.84    0.00
CTC P5                    P5.3        110.42       0.08     80.16    0.00
Normal                    N1            5.53   13232.71    543.97    2.98
Normal                    N2            4.65    7547.27    181.07    2.67
Normal                    N3            2.97   11194.27    341.45    3.12
WBC                       P2.WBC        0.80       2.20      0.00   20.69
T24 (bladder cancer)      T24.1         0.02       0.13      0.05    0.00
T24 (bladder cancer)      T24.2         0.01       0.00      0.00    0.09

Figure  4.  Prostate-specific  and  leukocyte-specific  genes.  (A)  For  each  gene,  FPKM  for  each  cell  is 
shown.  FPKM  values  greater  than  1  are  shaded  green.  AR,  KLK3,  and  TMPRSS2  are  markers  of 
prostate  tissue.  CD45  is  a  white  blood  cell  marker. 




Reference  List 


1.  Chaffer,C.L. and Weinberg,R.A. (2011). A perspective on cancer cell metastasis. Science
331, 1559-1564.

2.  Cohen,S.J., Punt,C.J., Iannotti,N., Saidman,B.H., Sabbath,K.D., Gabrail,N.Y., Picus,J.,
Morse,M., Mitchell,E., Miller,M.C., Doyle,G.V., Tissing,H., Terstappen,L.W., and
Meropol,N.J. (2008). Relationship of circulating tumor cells to tumor response,
progression-free survival, and overall survival in patients with metastatic colorectal
cancer. J Clin Oncol 26, 3213-3221.

3.  Cristofanilli,M., Budd,G.T., Ellis,M.J., Stopeck,A., Matera,J., Miller,M.C., Reuben,J.M.,
Doyle,G.V., Allard,W.J., Terstappen,L.W., and Hayes,D.F. (2004). Circulating tumor cells,
disease progression, and survival in metastatic breast cancer. N Engl J Med 351, 781-791.

4.  Danila,D.C., Pantel,K., Fleisher,M., and Scher,H.I. (2011). Circulating tumors cells as
biomarkers: progress toward biomarker qualification. Cancer J 17, 438-450.

5.  Paris,P.L., Kobayashi,Y., Zhao,Q., Zeng,W., Sridharan,S., Fan,T., Adler,H.L., Yera,E.R.,
Zarrabi,M.H., Zucker,S., Simko,J., Chen,W.T., and Rosenberg,J. (2009). Functional
phenotyping and genotyping of circulating tumor cells from patients with castration
resistant prostate cancer. Cancer Lett. 277, 164-173.

6.  Punnoose,E.A., Atwal,S.K., Spoerke,J.M., Savage,H., Pandita,A., Yeh,R.F., Pirzkall,A.,
Fine,B.M., Amler,L.C., Chen,D.S., and Lackner,M.R. (2010). Molecular biomarker analyses
using circulating tumor cells. PLoS One 5, e12517.

7.  Racila,E., Euhus,D., Weiss,A.J., Rao,C., McConnell,J., Terstappen,L.W., and Uhr,J.W. (1998).
Detection and characterization of carcinoma cells in the blood. Proc. Natl. Acad. Sci. U. S. A.
95, 4589-4594.

8.  Yu,M., et al. (2013). Circulating breast tumor cells exhibit dynamic changes in epithelial and
mesenchymal composition. Science 339, 580-584.



OPEN ACCESS Freely available online

PLOS ONE


mRNA-Seq  of  Single  Prostate  Cancer  Circulating  Tumor 
Cells  Reveals  Recapitulation  of  Gene  Expression  and 
Pathways  Found  in  Prostate  Cancer 

Gordon M. Cann1, Zulfiqar G. Gulzar2, Samantha Cooper1, Robin Li1, Shujun Luo1, Mai Tat1,
Sarah Stuart1, Gary Schroth1, Sandhya Srinivas3, Mostafa Ronaghi1*, James D. Brooks2,
AmirAli H. Talasaz1*

1 Department of Diagnostic Research, Illumina, Inc., Hayward, California, United States of America, 2 Department of Urology, Stanford University Medical Center, Stanford,
California, United States of America, 3 Department of Medicine, Division of Oncology, Stanford University Medical Center, Stanford, California, United States of America


Abstract 

Circulating  tumor  cells  (CTC)  mediate  metastatic  spread  of  many  solid  tumors  and  enumeration  of  CTCs  is  currently  used  as 
a  prognostic  indicator  of  survival  in  metastatic  prostate  cancer  patients.  Some  evidence  suggests  that  it  is  possible  to  derive 
additional  information  about  tumors  from  expression  analysis  of  CTCs,  but  the  technical  difficulty  of  isolating  and  analyzing 
individual  CTCs  has  limited  progress  in  this  area.  To  assess  the  ability  of  a  new  generation  of  MagSweeper  to  isolate  intact 
CTCs  for  downstream  analysis,  we  performed  mRNA-Seq  on  single  CTCs  isolated  from  the  blood  of  patients  with  metastatic 
prostate  cancer  and  on  single  prostate  cancer  cell  line  LNCaP  cells  spiked  into  the  blood  of  healthy  donors.  We  found  that 
the  MagSweeper  effectively  isolated  CTCs  with  a  capture  efficiency  that  matched  the  CellSearch  platform.  However,  unlike 
CellSearch,  the  MagSweeper  facilitates  isolation  of  individual  live  CTCs  without  contaminating  leukocytes.  Importantly, 
mRNA-Seq  analysis  showed  that  the  MagSweeper  isolation  process  did  not  have  a  discernible  impact  on  the  transcriptional 
profile  of  single  LNCaPs  isolated  from  spiked  human  blood,  suggesting  that  any  perturbations  caused  by  the  MagSweeper 
process  on  the  transcriptional  signature  of  isolated  cells  are  modest.  Although  the  RNA  from  patient  CTCs  showed  signs  of 
significant  degradation,  consistent  with  reports  of  short  half-lives  and  apoptosis  amongst  CTCs,  transcriptional  signatures  of 
prostate  tissue  and  of  cancer  were  readily  detectable  with  single  CTC  mRNA-Seq.  These  results  demonstrate  that  the 
MagSweeper  provides  access  to  intact  CTCs  and  that  these  CTCs  can  potentially  supply  clinically  relevant  information. 

Introduction

Circulating  tumor  cells  (CTC)  are  cells  that  part  from  a  primary 
tumor  or  metastasis  and  enter  the  blood  stream  via  the  leaky 
vasculature  that  arises  around  a  growing  tumor.  Once  in  the 
blood,  CTCs  encounter  damaging  stresses  associated  with 
hemodynamic  shear,  low  oxygen  conditions,  lack  of  anchorage 
sites,  and  immune  system  attack  [1,2].  A  small  number  of  CTCs 
survive  however  and  extravasate  into  surrounding  tissues  to  seed 
metastasis  or  reseed  the  primary  tumor  [3].  Described  over  a 
century ago [4], CTCs can now be enumerated using the FDA
approved  CellSearch  platform  to  provide  prognostic  information 
regarding  survival  for  metastatic  breast,  colon  and  prostate  cancer 
patients  [5-7].  Moving  beyond  enumeration,  several  groups  have 
suggested  that  genetic  and  transcriptional  analysis  of  individual 
CTCs  might  be  leveraged  to  make  personalized  medical  decisions 
for  cancer  therapy  and  provide  insights  into  the  biological 
processes involved in metastasis [8-10].


Several  methods  have  been  exploited  to  isolate  CTCs  from  red 
and  white  blood  cells  (WBCs).  Differentiating  physical  properties 
and  surface  markers  of  CTCs  have  been  utilized  for  their  isolation 
by  filtration  [11],  microfluidic  chip  [12,13],  buoyant  density 
centrifugation  [14],  immunomagnetic  selection  [15,16],  functional 
enrichment  and  detection  [17,18],  and  automated  immune 
microscopy  [19,20].  Immunomagnetic  enrichment  with  anti- 
EpCAM  beads  followed  by  fluorescence  activated  cell  sorting 
has  recently  been  shown  to  be  an  effective  approach  for  isolating 
CTCs  relatively  free  of  hematopoietic  cells  [21].  Of  the  platforms 
currently  in  use  for  isolating  CTCs,  the  MagSweeper  technology 
provides  great  ease  of  use  and  access  to  highly  pure,  intact, 
individual  CTCs  suitable  for  genetic  and  proteomic  analysis 
[22,23].

CTCs  are  generally  present  in  low  numbers  in  patient  blood 
samples  (typically  1  per  10'  nucleated  cells  in  blood)  so  extracting 
maximal  information  from  single  or  available  CTCs  isolated  from 
a  patient’s  blood  sample  is  essential.  Next  generation  DNA 
sequencing  is  particularly  well  suited  for  deep  interrogation  of 
cancer genomes and transcriptomes [24] even when applied at the
single  cell  level  [25].  In  this  study,  we  validated  the  performance  of 
a  new  generation  of  the  MagSweeper  using  spiked  LNCaP 
prostate  cancer  cells  in  normal  blood.  We  then  conducted  a 
capture  sensitivity  comparison  of  prostate  cancer  CTCs  between 
CellSearch  and  the  MagSweeper  on  replicate  patient  samples. 
Whole  transcriptome  sequencing  studies  of  single  LNCaP  cells 
revealed  that  MagSweeper  isolation  has  minimal  effects  on  gene 
expression.  Furthermore,  mRNA-Seq  mediated  transcriptome 
profiles  of  individual  prostate  CTCs  isolated  from  metastatic 
patient  blood  were  compared  to  normal  prostate  tissue  samples 
and  single  prostate  cancer  cell  lines.  Despite  cell  to  cell 
heterogeneity  and  a  wide  range  of  CTC  RNA  quality,  higher 
expression  of  prostate  related  genes  such  as  the  androgen  receptor 
(AR),  KLK3  (PSA)  and  TMPRSS2  could  be  distinguished  in 
prostate  CTCs.  Bioinformatic  screens  for  genes  expressed  100  fold 
higher  in  CTCs  compared  with  normal  prostate  samples  revealed 
other  known  gene  pathways  and  signatures  expected  of  prostate 
cancer  and  their  host’s  treatment  history. 

Materials  and  Methods 

Ethics  Statement 

This  study  was  reviewed  and  approved  by  Stanford’s  Human 
Subjects Research Compliance Board and adhered to HIPAA
regulations.  All  human  subjects  signed  informed  consent  prior  to 
blood  sample  collection. 

Patient  samples  and  blood  collection 

Patient samples were collected in 10 ml EDTA tubes (Becton
Dickinson) and processed within 12 hours of collection. Samples
were  collected  according  to  guidelines  specified  and  approved  by 
an  Institutional  Review  Board  and  after  informed  consent.  For 
comparisons  between  MagSweeper  and  CellSearch,  a  second 
7.5  ml  blood  sample  was  collected  in  a  CellSave  tube.  CellSearch 
assays were performed by Quest Diagnostics. Total RNA from
three histologically normal prostate tissues was obtained from
surgically  removed  prostates  under  a  separate  IRB-approved 
protocol. 

Cell  Culture  and  Cell  Spiking 

LNCaP,  PC-3,  and  T24  cells  were  purchased  and  cultured 
according  to  conditions  specified  by  the  American  Type  Culture 
Collection (ATCC). Following dissociation, live and dead cells were
determined by Trypan blue exclusion. For spiking, cells were
diluted to approximately 3 x 10^3 cells per ml and the cell concentration
was verified by counting six 10 microliter aliquots of cells
spotted on a glass microscope slide. No correction for dead cells
based  on  Trypan  blue  exclusion  was  used  prior  to  spiking  cells. 

Bead  Binding  and  Cell  Surface  Staining 

Custom  1.5  um  Streptavidin  coated  magnetic  beads  were 
functionalized  with  a  custom  biotinylated  monoclonal  antibody 
directed  against  extracellular  human  EpCAM  epitope.  Two 
3.75  ml  volumes  of  blood  per  sample  were  subjected  to  red  blood 
cell lysis with 10 volumes of 1x PharmLyse (BD Biosciences) for
15 minutes at room temperature. Remaining cells were pelleted at
4°C for 15 min at 300 x g. Cell pellets were transferred with
2 x 1 ml aliquots of 1% BSA/PBS/5 mM EDTA into a 2 ml flat
bottom microcentrifuge tube (VWR International). Cells were
then pelleted for 5 minutes at 510 x g. Cell pellets were
resuspended in a total volume of 1 ml of 1% BSA/PBS/5 mM
EDTA containing 15 ul of Alexa 488 anti-human CD45 (Life
Technologies) and 30 ul of our custom anti-EpCAM beads.


Samples  were  rotated  for  30  minutes  at  4°C  followed  by  addition 
of  20  ul  of  Phycoerythrin  (PE)  anti-human  EpCAM  monoclonal 
antibody  (BD  Biosciences  347198).  Samples  were  then  rotated  at 
4°C  for  an  additional  30  minutes  and  then  transferred  to  a  well  in 
a 6 well plate containing 6 ml of 1% BSA/PBS/5 mM EDTA.
Samples  were  mixed  once  by  pipetting  up  and  down  in  a  10  ml 
pipette,  and  plates  were  spun  for  5  minutes  at  400  rpm,  followed 
by  incubation  for  15  minutes  at  4°C  prior  to  MagSweeper 
isolation.  In  some  spiking  experiments,  LNCaP  cells  were  labeled 
prior  to  spiking  with  CFDA  (Life  Technologies)  following 
the manufacturer's instructions.

MagSweeper  and  Single  Cell  Isolation 

Putative  CTCs  were  isolated  using  two  rounds  of  MagSweeper 
isolation.  Cells  isolated  after  the  first  round  of  MagSweeping  were 
released  and  stained  in  a  well  of  a  6-well  plate  containing  500  nM 
membrane  impermeable  DAPI  (Life  Technologies)  so  that 
membrane  compromised  cells  could  be  identified.  Following  a 
second  round  of  MagSweeping  cells  were  dispersed  and  pelleted  at 
400  rpm  for  1  min  in  a  well  of  a  6-well  low  adhesion  plate  in  10% 
Superblock  (ThermoFisher)/PBS.  Wells  were  viewed  with  an 
Olympus inverted microscope equipped for epifluorescence.
Putative  CTCs  were  identified  as  cells  that  stained  positive  for 
PE  anti-EpCAM  and  negative  for  Alexa  488  anti-CD45.  Putative 
CTCs  that  were  DAPI+  were  excluded  from  further  analysis. 
DAPI  negative  putative  CTCs  were  isolated  in  1  ul  of  10% 
Superblock/PBS  with  a  pipetteman  into  a  0.2  ml  PCR  tube 
containing  2.5  ul  of  5%  Ribonuclease  Inhibitor  (Life  Technolo- 
gies)/0.2%  Triton  X-100  (10%  solution,  Sigma)  prepared  in 
nuclease  free  water.  Collected  cells  were  flash  frozen  on  dry  ice 
and stored at -80°C. Cell purity was measured as the number of
spiked  cells  recovered  divided  by  the  number  of  spiked  cells 
recovered  plus  the  number  of  leukocytes. 
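The recovery and purity definitions used here reduce to two ratios. The following Python sketch is only an illustration of that arithmetic; the function names and the example counts are hypothetical and are not data from this study.

```python
def recovery(recovered_spiked, total_spiked):
    """Fraction of spiked cells that were recovered after isolation."""
    if total_spiked <= 0:
        raise ValueError("total_spiked must be positive")
    return recovered_spiked / total_spiked


def purity(recovered_spiked, leukocytes):
    """Spiked cells recovered / (spiked cells recovered + leukocytes)."""
    total = recovered_spiked + leukocytes
    if total == 0:
        raise ValueError("no cells were counted")
    return recovered_spiked / total


# Hypothetical counts for illustration only (not data from this study).
example_recovery = recovery(recovered_spiked=8, total_spiked=10)
example_purity = purity(recovered_spiked=8, leukocytes=24)
```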

Single  Cell  mRNA-Seq 

Single cells were lysed and RNA reverse transcribed using the
SMARTer Ultra Low Input RNA for Illumina Sequencing kit
(Clontech). cDNA was amplified using the Advantage 2 PCR kit
(Clontech) for 18-25 cycles prior to conversion into an Illumina
compatible DNA sequencing library using the Nextera DNA
Sample Prep Kit (Illumina) and 12 cycles of PCR to amplify the
library. Libraries were quantified using a BioAnalyzer (Agilent)
and qPCR using a Kappa SYBR Green PCR kit (Kappa
Biosciences) on an Illumina ECO qPCR machine. Paired end
flow cells were prepared using 8 pM of Nextera library per lane on
a cBot (Illumina) and sequenced using single 50 bp reads on an
Illumina GAIIx.

Alignment 

Sequencing data were collected with RTA version 1.9 and fastq
files were generated with Casava 1.8. Reads that did not pass
Illumina’s standard quality filter were removed by default. Reads
were aligned to hg19 with tophat v1.3.3 and counts per transcript
were calculated for hg19 in iGenomes using cufflinks v1.1 with the
options: --GTF <genes.gtf> --max-bundle-frags 20000000. Gene-
by-gene raw counts and fragments per kilobase of exon per million
fragments mapped (FPKM) were generated from the cufflinks
isoforms.fpkm_tracking file by identifying all the transcripts with
the same gene name and taking, respectively, the sum of coverage
multiplied by transcript length/read length for each transcript and
the sum of the FPKM for each transcript. The raw counts and
FPKMs were used in all the downstream analysis, except the QC
step.
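The gene-level aggregation described above can be sketched as follows. This is a minimal illustration and not the script used in the study: the record keys (gene, length, coverage, fpkm) and the example values are hypothetical, and a real cufflinks tracking file would first need to be parsed into this shape.

```python
from collections import defaultdict

READ_LENGTH = 50  # single 50 bp reads, as stated in the sequencing methods


def aggregate_by_gene(transcripts, read_length=READ_LENGTH):
    """Collapse per-transcript values into per-gene raw counts and FPKM.

    Each transcript is a dict with hypothetical keys:
    gene, length (bp), coverage (mean per-base depth), fpkm.
    """
    counts = defaultdict(float)
    fpkm = defaultdict(float)
    for t in transcripts:
        # coverage * length approximates the total aligned bases; dividing
        # by the read length gives an estimated read count per transcript.
        counts[t["gene"]] += t["coverage"] * t["length"] / read_length
        fpkm[t["gene"]] += t["fpkm"]
    return dict(counts), dict(fpkm)


# Illustrative values only, not data from this study.
example = [
    {"gene": "KLK3", "length": 1000, "coverage": 2.0, "fpkm": 30.0},
    {"gene": "KLK3", "length": 1500, "coverage": 1.0, "fpkm": 12.5},
    {"gene": "AR", "length": 4000, "coverage": 0.5, "fpkm": 4.0},
]
gene_counts, gene_fpkm = aggregate_by_gene(example)
```

Summing over transcripts that share a gene name means that a gene with several isoforms is reported once, with its counts and FPKM pooled across isoforms.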

Figure 1. Metrics of MagSweeper circulating tumor cell (CTC) isolation. (A) Image of hood prototype of the MagSweeper. (B) Percent
capture of 100 LNCaP cells spiked into 3.75 ml of normal blood (N = 54 experiments and 11 donors). Blue circles show mean percent recoveries of live
EpCAM(+)/CD45(-)/DAPI(-) cells and red circles show mean recoveries of membrane compromised EpCAM(+)/CD45(-)/DAPI(+) cells. Error bars
represent +/- 1 S.D. (C) Percentage capture and purity of 10 LNCaP cells isolated following spiking into 7.5 ml of normal blood. Blue bars are the
mean percent recovery of cells after MagSweeper isolation (Cell Capture) and manual pick and place single cell isolation (Cell Isolation) while purple
bars show purity of LNCaP cells after MagSweeper and single cell isolation (N = 6 experiments and 4 donors). Cell purity was calculated as the number
of spiked cells isolated divided by the number of spiked cells counted plus white blood cells counted. Error bars represent +/- 1 S.D. (D) Bright field
(BF)  and  images  of  fluorescently  stained  CTCs  isolated  from  a  prostate  cancer  patient  blood  sample,  and  contaminating  WBC  found  after 
MagSweeper  isolation.  Scale  bar  =  20  microns.  (E)  MagSweeper  versus  CellSearch  comparison  of  patient  samples.  Samples  with  0  CTC  were  assigned 
a  value  of  1  for  plotting  purposes. 
doi:10.1371/journal.pone.0049144.g001 


RNA-Seq  Data  Quality  Control 

Quality  control  (QC)  was  done  using  an  internal  Illumina  RNA- 
Seq  QC  script.  Base-by-base  coverage  was  calculated  across  a 
hand-picked  set  of  600  quality  control  genes  that  are  chosen  from 
RefSeq  and  are  highly  mappable  and  highly  abundant  in  universal 
human RNA (UHR) samples (Agilent). Highly mappable means
that >90% of bases for a selected transcript have 100%
mappability. High expression in UHR means that only genes
with average coverage >1x were selected. So for a given
transcript  1000  bp  in  length  there  are  at  least  20  reads  of  50  bp 
mapped  to  it.  Genes  in  this  set  with  greater  than  1  FPKM  were 
used  to  calculate  additional  statistics.  The  median  coverage  across 
all  the  length-normalized  QC  genes  with  greater  than  1  FPKM 
was  plotted.  For  each  quality  control  gene  with  greater  than  1 
FPKM,  the  coefficient  of  variation  was  calculated  across  the  gene, 
and  the  median  of  all  the  CVs  was  determined. 
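The coverage arithmetic and the median CV statistic can be illustrated with a short Python sketch. It is not the internal Illumina QC script, which is not described in detail here; the function names, the use of the population standard deviation, and the toy depth values are assumptions made for illustration.

```python
import math
import statistics


def min_reads_for_coverage(transcript_len, read_len, coverage=1.0):
    """Reads needed so that aligned bases / transcript length >= coverage."""
    return math.ceil(coverage * transcript_len / read_len)


def coefficient_of_variation(depths):
    """CV of per-base coverage across one gene (assumed: population SD / mean)."""
    return statistics.pstdev(depths) / statistics.fmean(depths)


def median_cv(per_gene_depths, per_gene_fpkm, min_fpkm=1.0):
    """Median of the per-gene CVs, using only QC genes above min_fpkm."""
    cvs = [
        coefficient_of_variation(depths)
        for gene, depths in per_gene_depths.items()
        if per_gene_fpkm[gene] > min_fpkm
    ]
    return statistics.median(cvs)


# A 1000 bp transcript at 1x coverage with 50 bp reads.
reads_needed = min_reads_for_coverage(1000, 50)

# Toy per-base depths for three hypothetical QC genes.
toy_depths = {"A": [2, 2, 2, 2], "B": [1, 3, 1, 3], "C": [0, 4, 0, 4]}
toy_fpkm = {"A": 5.0, "B": 2.0, "C": 0.5}  # gene C falls below 1 FPKM
toy_median_cv = median_cv(toy_depths, toy_fpkm)
```

A perfectly even gene such as A has a CV of 0, so a higher median CV indicates less uniform coverage along transcripts.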

Unsupervised  Clustering 

Unsupervised  hierarchical  clustering  [26]  was  done  with  the 
heatmap  function  in  R,  with  the  default  Euclidean  distance  used  as 
the dissimilarity metric. For each sample, the 100 genes with the
highest FPKM were selected and the resulting pool of 312 genes
was  used  in  the  clustering. 
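The gene selection step that precedes clustering can be sketched in Python as below. The clustering itself was done with the heatmap function in R; this sketch only shows how a pooled gene list and a Euclidean distance between two samples could be computed. The function names, sample labels and FPKM values are hypothetical.

```python
import math


def top_gene_pool(fpkm_by_sample, n_top=100):
    """Union of the n_top highest-FPKM genes taken from every sample."""
    pool = set()
    for gene_fpkm in fpkm_by_sample.values():
        ranked = sorted(gene_fpkm, key=gene_fpkm.get, reverse=True)
        pool.update(ranked[:n_top])
    return sorted(pool)


def euclidean(sample_a, sample_b, genes):
    """Euclidean distance between two samples over a shared gene list.

    Genes absent from a sample are treated as 0 FPKM (an assumption).
    """
    a = [sample_a.get(g, 0.0) for g in genes]
    b = [sample_b.get(g, 0.0) for g in genes]
    return math.dist(a, b)


# Toy input with n_top=2; values are illustrative only.
toy = {
    "P1.1": {"KLK3": 50.0, "AR": 20.0, "ACTB": 5.0},
    "P2.1": {"KLK3": 40.0, "TOP2A": 30.0, "AR": 1.0},
}
pool = top_gene_pool(toy, n_top=2)
distance = euclidean(toy["P1.1"], toy["P2.1"], pool)
```

Because highly expressed genes overlap between samples, the pooled list is smaller than the number of samples multiplied by 100.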

LNCaP  expression  analysis 

EdgeR  [27]  was  run  using  moderated  tagwise  dispersions  with 
the  raw  counts  per  gene  as  input.  Correlation  between  samples 
was  calculated  based  on  the  raw  number  of  reads  mapped  to  each 
gene. 

Genes  over-expressed  in  CTCs 

Due  to  the  inherent  variability  in  single  cell  data,  coupled  with 
the varying degrees of apparent mRNA decay observed in the
CTCs, a simple thresholding method was used to identify over-expressed
genes. For each gene the ratio of the 2nd highest FPKM
value among the set of CTCs to the FPKM in the normal prostate
RNA was calculated. Genes with a ratio of at least 100x and at
least  10  total  reads  in  one  of  the  CTCs  were  selected.  GO 
ontologies  were  generated  using  the  Panther  Classification  System 
(http://www.pantherdb.org/)  [28]. 
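A minimal Python sketch of this thresholding rule is given below. It is an illustration under stated assumptions rather than the code used in the study: the normal prostate reference is taken as a single FPKM value per gene, a gene with zero FPKM in normal tissue passes the ratio test whenever the 2nd highest CTC value is positive, and all names and values are hypothetical.

```python
def overexpressed_genes(ctc_fpkm, ctc_reads, normal_fpkm,
                        min_ratio=100.0, min_reads=10):
    """Select genes by the 2nd-highest-FPKM ratio and a read-count floor.

    ctc_fpkm / ctc_reads: {gene: [one value per CTC]}
    normal_fpkm: {gene: FPKM in normal prostate RNA}
    """
    selected = []
    for gene, values in ctc_fpkm.items():
        if len(values) < 2:
            continue  # a 2nd highest value needs at least two CTCs
        second_highest = sorted(values, reverse=True)[1]
        normal = normal_fpkm.get(gene, 0.0)
        if normal > 0:
            passes_ratio = second_highest / normal >= min_ratio
        else:
            # Assumption: the handling of zero reference values is not
            # specified in the text, so any positive CTC value passes.
            passes_ratio = second_highest > 0
        # "at least 10 total reads in one of the CTCs"
        passes_reads = max(ctc_reads[gene]) >= min_reads
        if passes_ratio and passes_reads:
            selected.append(gene)
    return selected


# Toy values for three hypothetical genes across three CTCs.
toy_fpkm = {"G1": [500.0, 300.0, 0.0],
            "G2": [500.0, 1.0, 0.0],
            "G3": [250.0, 220.0, 10.0]}
toy_reads = {"G1": [40, 25, 0], "G2": [60, 1, 0], "G3": [5, 4, 1]}
toy_normal = {"G1": 2.0, "G2": 2.0, "G3": 1.0}
hits = overexpressed_genes(toy_fpkm, toy_reads, toy_normal)
```

Using the 2nd highest value rather than the maximum means that a gene must be elevated in at least two CTCs, which limits the influence of a single outlier cell (G2 in the toy data).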

Results

MagSweeper  metrics  for  CTC  isolation 

A new prototype of the MagSweeper [22] was developed that is
compatible with operation in most biosafety cabinets (Figure 1A).

To assess the performance of this platform, over an 11 month
period, 100 LNCaP cells were repeatedly spiked into 3.75 ml of
normal blood and isolated with the MagSweeper (n = 54
experiments and 11 blood donors). Prior to spiking, LNCaP cell
viability  measured  by  Trypan  blue  exclusion  was  89% ±8%. 
During  isolation  from  spiked  blood  LNCaP  cells  were  fluorescently 
labeled  with  antibodies  against  EpCAM  and  CD45  to  distinguish 
LNCaP  cells  from  WBCs,  and  the  membrane  impermeable 
nuclear  stain  DAPI  was  used  to  identify  dead  (membrane 
compromised)  cells.  Post-isolation  we  recovered  a  mean  of 
81% ± 16% of live spiked LNCaP cells (EpCAM+/CD45-/
DAPI-) (Figure 1B, blue line) while DAPI staining revealed an
additional 12% ± 6% of the isolated LNCaP cells that were
membrane compromised (EpCAM+/CD45-/DAPI+). Since the
original cell spike contained 11% dead cells (measured by Trypan
blue exclusion), recovery of 12% DAPI positive cells following
MagSweeping  indicates  that  damage  to  spiked  LNCaP  cells  during 
the  entire  procedure  is  minimal.  Normal,  unspiked  blood  samples 
failed  to  yield  EpCAM  positive  cells  (data  not  shown). 

To  assess  purity  of  ultra-rare  cells  after  MagSweeping,  7.5  ml  of 
normal donor blood was spiked with 10 CFDA labeled LNCaP cells (n = 6
experiments  and  4  donors)  and  processed  according  to  our 
protocol  for  CTC  isolation.  Captured  LNCaP  cells  were  identified 
by  fluorescent  detection  of  CFDA,  while  contaminating  WBCs 
were  identified  by  positive  CD45  staining.  Following  magnetic 
sorting  a  mean  percentage  LNCaP  recovery  of  63% ±25%  was 
observed.  Enumeration  of  CD45  positive  cells  yielded  an  initial 
purity  of  isolated  LNCaP  cells  after  MagSweeper  cell  isolation  of 
10%±6%  and  100%  post  single-cell  isolation  using  a  pick  and 
place  method  (Fig.  1C). 

We  next  performed  a  comparative  analysis  of  MagSweeper 
versus  CellSearch  in  enumeration  of  CTC  in  blood  samples  from 
patients  with  metastatic  prostate  cancer.  At  the  time  of  draw,  two 
7.5  ml  samples  of  blood  were  collected,  one  sample  in  a  standard 
EDTA  tube  for  MagSweeper  cell  isolation  and  a  second  in  a 
CellSave  tube  for  analysis  by  an  independent  lab  (Quest 
Diagnostics)  using  the  CellSearch  assay.  Immunofluorescent 
staining  of  MagSweeper  isolated  cells  identified  CTCs  as  EpCAM 
positive  and  CD45  negative.  CTCs  were  easily  distinguished  from 
WBCs, which were CD45 positive and EpCAM negative (Fig. 1D).
Numbers  of  completely  purified  CTCs  enumerated  by  MagSwee- 
per  isolation  and  CellSearch  were  compared  and  found  to  be 
reasonably  similar,  with  a  mild  trend  observed  toward  better  CTC 
capture  using  the  MagSweeper  in  samples  with  low  CTC  numbers 
(Figure 1E).

MagSweeper  isolation  has  minimal  effects  on  single  cell 
transcriptomes 

To  assess  whether  the  MagSweeper  isolation  process  affected 
the  global  transcriptional  profile  of  isolated  cells,  we  performed 
single  cell  mRNA-Seq  on  4  fresh  LNCaP  cells  just  prior  to  spiking 
into  blood,  and  on  4  LNCaPs  after  MagSweeper  isolation  from 
spiked  normal  blood.  Isolated  cells  were  stored  frozen  for  at  least  a 
month  to  simulate  storage  conditions.  BioAnalyzer  traces  of 
amplified  cDNA  from  a  fresh  and  a  MagSweeper  isolated  cell 
revealed  similar  molecular  weight  peaks  of  amplification  products 
centered  at  approximately  1000  base  pairs  (Fig.  2A). 

To  study  genes  differentially  expressed  between  fresh  and 
MagSweeper isolated single cells, we first used the Bioconductor
edgeR package. We identified only 1 gene as differentially
expressed  between  MagSweeper-isolated  and  control  LNCaP  cells 
at  a  false  discovery  rate  (FDR)  cutoff  of  0.05  and  none  at  a  cutoff 
of  0.01. 

We  also  explored  several  other  methods  for  identifying 
differences  between  the  fresh  and  MagSweeper  isolated  cells. 
First,  to  characterize  the  degree  of  cell-to-cell  variability,  we  asked 
how many genes had a very high FPKM (>10) in at least 1 sample
and a very low FPKM (<0.1) in at least one sample. We
considered  three  different  groups  of  samples:  (1)  4  fresh  cells  (2)  4 
MagSwept  cells  and  (3)  2  fresh  and  2  MagSwept  cells.  The  number 
of  highly  variable  genes  in  four  fresh  cells  (215),  four  MagSwept 
cells  (268)  and  a  combination  of  2  fresh  and  2  MagSwept  cells 
(239)  was  similar  in  all  groups  and  likely  reflects  cell  to  cell 
heterogeneity. In another comparison we calculated the correlation
of the number of reads per gene between each pair of samples,
and found that the within-group correlations were no stronger
than the between-group correlations - that is, the cells did not
cluster  based  on  isolation  method  (Figure  2B).  Finally  in  a  separate 
study,  we  ran  an  Illumina  expression  microarray  on  pools  of 
10,000  LNCaP  cells  pre-  and  post-MagSweeper  isolation.  Again, 
there was high cross-correlation (R^2 = 0.985) between MagSweeper
isolated cells and controls indicating that MagSweeper
produces  minimal  alterations  in  gene  expression  (data  not  shown). 
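The two comparisons above (counting highly variable genes and correlating reads per gene between pairs of cells) can be sketched in Python as follows. The type of correlation is not stated in the text, so Pearson correlation is assumed here; the function names and toy values are hypothetical.

```python
import math
import statistics


def count_highly_variable(fpkm_by_gene, high=10.0, low=0.1):
    """Count genes with FPKM > high in some sample and < low in another."""
    return sum(
        1
        for values in fpkm_by_gene.values()
        if max(values) > high and min(values) < low
    )


def pearson(x, y):
    """Pearson correlation of two equal-length read-count vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy FPKM values for three hypothetical genes across four cells.
toy = {
    "A": [12.0, 0.05, 3.0, 8.0],   # variable: >10 in one cell, <0.1 in another
    "B": [15.0, 11.0, 20.0, 9.0],  # consistently expressed
    "C": [0.0, 0.0, 0.2, 30.0],    # variable
}
n_variable = count_highly_variable(toy)
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
```

Comparing the count for a mixed group (2 fresh and 2 MagSwept cells) with the counts for the pure groups indicates whether the isolation procedure adds variability beyond ordinary cell to cell differences.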

mRNA-Seq  of  single  prostate  cancer  CTCs 

We  prepared  amplified  cDNA  from  67  CTCs  isolated  from  13 
prostate  cancer  patients.  CTCs  were  isolated  after  MagSweeping 
based  on  immunofluorescent  staining  for  cells  that  were 
(EpCAM+/CD45-/DAPI-). Unlike cultured LNCaP cells
(Figure  2A),  we  found  that  there  was  a  wide  range  of  sizes  of 
amplified  cDNAs  in  patient-derived  CTCs,  reflective  of  initial 
RNA  quality.  We  characterized  traces  of  amplification  products 
into  3  groups:  1)  those  with  peaks  centered  around  1000  bps  (good 
quality), 2) traces with intermediate length amplification products
(partially  degraded),  and  3)  traces  with  predominantly  low 
molecular  weight  amplification  products  (degraded)  (Fig.  2C). 
Looking  across  all  67  CTCs,  21%  had  good  quality  RNA,  37% 
were  partially  degraded,  and  42%  of  the  samples  were  degraded 
(Fig.  2D).  RNA  quality  tended  to  be  somewhat  patient  specific  -  for 
example  patient  2  yielded  the  highest  number  (8/12)  of  good 
quality  RNA  samples.  Using  RNA  quality  as  a  guide,  we 
sequenced  the  libraries  of  24  CTCs  which  included  representatives 
of  all  three  RNA  quality  classifications,  and  aligned  sequences  to 
the human genome (build hg19). Based on a sequence alignment
score of greater than five percent, sequence data for 20 CTCs
collected from 4 patients (P1, P2, P3 and P4) were selected for
further  in  depth  study  (Fig.  2E). 

mRNA-Seq  data  quality 

To  assess  the  mRNA-Seq  data  quality,  coverage  was  calculated 
using  a  quality  control  script  and  a  handpicked  set  of  600  quality 
control  genes  that  are  highly  mappable  and  expressed  in  universal 
human  RNA  (see  Methods,  RNA-Seq  Data  Quality  Control  for 
definition).  The  sequencing  data  from  the  single  LNCaP,  PC-3, 
and  T24  cells  passed  the  quality  control  standards  for  >60% 
alignment  and  <65%  median  alignment  CV  typically  applied  to 
large  RNA  input  mRNA-Seq  data  sets  (Fig.  3A).  In  contrast, 
CTCs  displayed  higher  coverage  median  CVs  and  lower 
percentage  of  alignments  than  cultured  cells.  Typical  coverage 
plots  for  cell  lines,  normal  prostate  tissues,  and  CTCs  are  shown  in 
Figure  3B.  The  cell  lines  displayed  smooth  coverage  across  the 
length  of  the  transcript,  while  the  normal  prostate  samples  had  a 
slight  3'  bias,  typical  for  tissue  samples  (Fig.  3B).  The  CTCs  had  a 

Figure 2. MagSweeper isolation has minimal effects on single cell transcriptomes. (A) Bioanalyzer traces of amplified cDNAs from single
LNCaP  cells  pre  (Single  LNCaP(pre))  and  post  MagSweeping  (Single  LNCaP  (post)),  and  positive  control  (12  pg  of  LNCaP  total  RNA)  and  negative 
control  (Negative  control).  (B)  Heatmap  of  correlations  between  fresh  and  MagSwept  single-cell  RNA-Seq  data  and  table  of  correlations  between  fresh 
and  MagSwept  samples.  Yellow  indicates  higher  correlations  and  red  lower  correlations.  (C)  Representative  bioanalyzer  traces  of  good,  intermediate, 
and  poor  CTC  cDNA  amplification  products.  (D)  Percent  breakout  of  CTC  RNA  quality  based  on  classification  of  cDNA  amplification  products  -  green 
indicates  good  quality,  yellow  samples  are  partially  degraded  RNA  and  red  indicates  degraded  RNA  samples.  (E)  Sequenced  CTCs,  their  RNA  quality 
and  %  alignment  of  passing  filter  mRNA-Seq  reads  to  the  human  genome  build  hg19.  Patient  CTC  ID  indicates  single  patient  CTCs  identified  as 
patient number.CTC number (e.g., P1.1). RNA Quality is based on bioanalyzer traces of amplified cDNA and Align% is alignment % of mRNA-Seq reads.
doi:10.1371/journal.pone.0049144.g002 


wide  range  of  coverage  bias,  as  shown.  Since  oligo-dT  was  used  to 
prime  cDNA  synthesis,  a  3-prime  bias  in  the  coverage  data 
suggests  mRNA  degradation. 

mRNA-Seq  data  content 

To  understand  transcript  abundance,  variation  and  range  in 
CTCs,  we  compared  the  distributions  of  FPKM  values  of  all 
RefSeq RNAs detected in CTCs and LNCaP cells (Tables S1 and
S2). In LNCaP cells (n = 4), the number of RefSeq transcripts with
≥10 FPKM was 4622 ± 136.2 transcripts (with a
range  of  4485  to  4786  transcripts).  In  contrast,  in  CTCs  (n  =  20) 
the  number  of  RefSeq  transcripts  with  >10  FPKM  was  2362  ±865 
transcripts  (with  a  range  of  1233  to  3987  transcripts). 

Next, we looked at expression levels of several prostate markers
and  a  leukocyte  marker  to  establish  that  the  patient-derived  cells 
were  of  prostatic  origin:  androgen  receptor  (AR),  prostate-specific 
antigen  (PSA,  KLK3),  TMPRSS2,  and  the  leukocyte  marker 
CD45  in  all  CTC  samples.  Figure  4A  shows  the  FPKM  of  each  of 
these  markers  in  patient  CTCs,  tissue  culture  cell  lines  and  normal 


prostate,  with  values  of  >  1  FPKM  shaded  in  green.  All  but  one  of 
the  CTCs  was  positive  for  at  least  one  of  the  prostate  markers. 
LNCaP  and  normal  prostate  showed  expected  expression  of  these 
prostate markers. PC-3 lacked KLK3 and AR expression, as
expected  [29]  but  did  express  TMPRSS2.  Importantly,  all  the 
CTCs  and  the  cell  lines  were  negative  for  the  WBC  marker  CD45. 
Normal  prostate,  which  is  comprised  of  many  cell  types  including 
WBCs,  was  positive  for  CD45  expression  as  was  the  single  WBC 
that  was  subjected  to  mRNA-Seq.  Finally,  T24,  a  bladder  cancer 
cell  line  did  not  express  any  of  these  markers.  To  understand  how 
patient CTCs relate to one another, we performed an unsupervised
clustering analysis of all patient CTCs. The analysis revealed
that with the exception of two CTCs (P2.9 and P1.2), all CTCs
from  individual  patients  clustered  in  a  patient  specific  manner 
(Fig.  4B). 

CTC  Pathway  Analysis 

To find genes and pathways that were differentially activated in
CTCs  we  compared  transcript  profiles  of  CTCs  to  those  from 

Figure 3. Alignment metrics of human prostate CTC mRNA-Seq sequences. (A) Percentage of passing filter (PF) reads that aligned,
percentage  of  alignments  that  map  to  coding  regions,  median  coverage  CV,  and  percentage  of  reads  that  map  to  the  five  genes  with  the  highest 
number  of  mapped  reads.  To  calculate  the  "Median  CV",  first  the  CV  of  coverage  was  calculated  across  each  of  600  genes  in  a  hand-picked  set  of 
quality  control  genes  expressed  in  universal  human  RNA  (Agilent),  and  then  the  median  of  these  CVs  is  taken  (considering  only  the  genes  with  at 
least  1  x  coverage).  The  "%  aligned"  is  the  percentage  of  PF  reads  that  align  to  the  genome  or  to  a  splice  site,  excluding  mitochondria  and  ribosomal 

normal prostate tissue and focused on genes that were over-expressed
in the CTCs. To normalize the RNA-Seq data based on
gene  sizes  and  number  of  mapped  fragments,  we  used  tophat  and 
cufflinks  to  determine  FPKM.  We  reasoned  that  the  varying 
degrees of RNA degradation in the CTCs would lead to false-positive
counting of under-expressed genes in the CTCs, especially
since the half-life of RNA varies from gene to gene. We used a
manual thresholding method to identify genes that were overexpressed
in CTCs. Specifically, we selected genes that were at least
100-fold  higher  in  at  least  2  of  the  CTCs  compared  to  normal 
prostate  tissue  and  that  contained  at  least  10  mapped  reads  in  at 
least  one  sample. 

We  identified  181  genes  over-expressed  in  CTCs  compared  to 
normal  prostate  tissue  (Table  S3).  To  gain  an  overview  of  the 
range  of  biological  functions  associated  with  these  transcripts, 
Gene Ontology annotations were derived from the GoSlim database
using the Panther Classification System browser [28] (Fig. 4C) and
categorized for biological processes. Out of 181 genes, 110 yielded
173 process hits which were classified into 14 biological processes.
These were displayed using the Panther Pie Chart feature (Fig. 4C).
Among the remaining 71 gene annotations not classified by
GoSlim, 37 were non-coding RNAs including members of the
MIR, SCARNA, SNAR, SNORA, SNORD and VTRNA families.
The remaining 34 transcripts could be identified using
GeneCards.  Examination  of  the  transcripts  classified  by  biological 
processes  revealed  that  one  third  were  associated  with  either 
metabolic processes (23%, GO:0008152) or the cell cycle (10%,
GO:0007049), consistent with mitotically active cells (Fig. 4C).
Cell cycle and mitosis associated transcripts in the highly expressed
gene set included TPX2, CCNA2, CCNB1 and B2, CDC20,
CSK2, CDC2, CDKN3, CENPE, CD28, TOP2A, ORC1L,
NUF2, CDK1, KIF2C, PTTG1 and TTK.

Interestingly,  the  list  contains  many  transcripts  germane  to 
prostate  cancer  biology.  For  instance,  all  3  CTCs  from  patient  1 
expressed high levels of SPINK1, a transcript and protein
identified  as  elevated  in  TMPRSS2-ERG  fusion  negative  prostate 
cancers  and  associated  with  aggressive  prostate  cancer  [30].  CTCs 
from patients 1 and 2 expressed high levels of BIRC5 (Survivin),
an  anti-apoptotic  gene  expressed  at  high  levels  in  castration- 
resistant  prostate  cancer.  CTCs  also  expressed  cancer  associated 
transcripts  (BAGE,  BAGE3,  CT45A1,  CT45A4,  CT45A5, 
CT45A6, CTAG1B, CTAG2, MAGEA12, MAGEA1, MAGEA3,

Figure 4. Expression, clustering, and functional classification of genes expressed in human prostate CTCs. (A) Expression of prostate
cancer  associated  genes.  For  each  gene,  fragments  per  kilobase  of  exon  per  million  fragments  mapped  (FPKM)  for  each  CTC  and  controls  are  shown. 
FPKM  values  greater  than  5  are  shaded  green  and  those  with  values  between  1  and  5  are  shaded  light  green.  AR  (androgen  receptor),  KLK3  (prostate 
specific antigen), and TMPRSS2 are markers of prostate tissue. CD45 is a white blood cell marker. Prostate cancer cell lines include LNCaP and PC-3
while T24 is a bladder cancer cell line, and WBC is a single white blood cell. Normal denotes normal prostate tissue. (B) Unsupervised clustering of over-expressed
genes in patient CTCs. Colored bars across the top of the figure indicate different patients while individual patient CTCs are listed at the
bottom  of  the  cluster.  (C)  Functional  classification  of  genes  overexpressed  in  CTC  using  Gene  Ontology  (GO)  classifications.  For  each  functional 
grouping  the  %  of  genes  over-expressed  in  each  GO  category  is  indicated. 
doi:10.1371/journal.pone.0049144.g004 
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MAGEA6,  MAGEC1,  MAGEC2,  and  PTTG1)  and  transcripts 
important  in  regulating  development  (HOXB7,  HOXB8, 
HOXB9,  NANOGNB,  and  LOG404266).  Notably  4  transcripts 
(TOP2A,  TK1,  TPX2  and  KIAA0101)  expressed  at  high  levels  in 
CTCs  were  found  in  a  list  of  31  transcripts  associated  with  disease 
recurrence  after  radical  prostatectomy  we  published  recently  [31]. 
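
The FPKM values reported in Figure 4A follow the definition given in its legend: fragments per kilobase of exon per million fragments mapped. A minimal Python sketch of that arithmetic and of the legend's shading thresholds is shown here. The function names, the example counts, and the treatment of values exactly equal to 1 or 5 as light green are assumptions made for illustration; the quantification software used in the study is not described in this section.

```python
def fpkm(fragments_on_gene: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million fragments mapped."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and total mapped fragments must be positive")
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments_on_gene / (kilobases * millions)


def shade(value: float) -> str:
    """Shading rule from the Figure 4A legend (boundary handling is assumed)."""
    if value > 5:
        return "green"
    if value >= 1:
        return "light green"
    return "unshaded"


if __name__ == "__main__":
    # Hypothetical counts for one gene in one cell, not data from the study.
    value = fpkm(fragments_on_gene=200, exon_length_bp=2_000, total_mapped_fragments=10_000_000)
    print(value, shade(value))
```

Normalizing by exon length and by sequencing depth makes values comparable across genes of different sizes and across cells sequenced to different depths, which is why the legend can apply fixed thresholds to every column.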

Using  Ingenuity  Pathway  Analysis  (IPA),  we  looked  for 
overrepresented  pathways  and  gene  sets.  The  top  Diseases  and 
Disorders  category  was  Cancer  and  the  top  Canonical  Pathway  was  “Cell 
Cycle:  G2/M  DNA  Damage  Checkpoint  Regulation”.  Looking 
specifically  at  genes  overlapping  the  IPA  function  Prostate  Cancer,  9 
genes  were  identified:  AR,  TK1,  PLK1,  MAGEA1,  MAGEC1, 
MAGEC2,  CTAG1B,  BIRC5,  and  TOP2A.  The  CTCs  from 
patient  2  contributed  most  significantly  to  these  results.  Repeating 
these  analyses  with  only  genes  identified  in  P2  versus  normal 
prostate  tissue  produced  very  similar  results  with  highly  significant 
p-values.  Interestingly,  excluding  P2  from  the  analysis  yields  a 
weaker  but  still  significant  association  with  prostate  cancer.  This  is 
consistent  with  the  observation  that  the  CTCs  from  P2  yielded 
significantly  better  quality  libraries. 
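
Overrepresentation of a gene set, as reported above for the IPA function Prostate Cancer, is commonly scored with a right-tailed hypergeometric test. The sketch below illustrates that general idea only. It is not IPA's implementation, and the function name and every count other than the 9 overlapping genes are hypothetical.

```python
from math import comb


def enrichment_p_value(universe: int, in_set: int, selected: int, overlap: int) -> float:
    """Right-tailed hypergeometric probability P(X >= overlap).

    universe: number of genes that could have been selected
    in_set:   number of those genes annotated to the gene set
    selected: number of genes in the overexpressed list
    overlap:  number of overexpressed genes that fall in the gene set
    """
    if not (0 <= in_set <= universe and 0 <= selected <= universe and overlap >= 0):
        raise ValueError("counts must be non-negative and no larger than the universe")
    upper = min(in_set, selected)
    total = comb(universe, selected)
    # Sum the probability of every overlap at least as large as the one observed.
    tail = sum(
        comb(in_set, k) * comb(universe - in_set, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical sizes; only the overlap of 9 genes comes from the text.
    print(enrichment_p_value(universe=20_000, in_set=300, selected=200, overlap=9))
```

A small p-value means an overlap this large would be unlikely if the overexpressed genes were drawn at random from the universe. The result depends strongly on the chosen universe, so it should be stated alongside any reported value.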

Discussion 

We  have  produced  a  new  generation  of  MagSweeper  which 
employs  more  sophisticated  cell  capture  hardware  and  software 
than  a  previous  version  [22],  and  has  a  reduced  footprint 
compatible  with  operation  in  most  biosafety  cabinets.  These 
improvements  combined  with  a  multi-marker  staining  protocol 
allow  the  user  to  distinguish  CTCs  by  fluorescent  staining  of  cell 
surface  markers.  Validation  of  MagSweeper  performance  revealed 
that  the  mean  capture  of  live  LNCaP  cells  spiked  into  blood  is 
81%  ±  16%,  which  is  comparable  with  the  capture  reported  for 
high  EpCAM  expressing  epithelial  cancer  cell  lines  spiked  into 
blood  on  other  CTC  capture  platforms  [32].  In  a  comparative 
enumeration  study  with  the  CellSearch  platform  using  prostate 
patient  samples,  MagSweeper  allowed  enumeration  of  comparable 
numbers  of  CTCs  with  a  slightly  better  recovery  of  CTCs  from 
patient  samples  with  low  starting  numbers  of  CTCs  (Fig.  1E). 
However,  unlike  other  CTC  isolation  technologies  (CellSearch 
cartridges,  and  CTC  and  OncoCEE  chips),  MagSweeper  technology 
allows  isolation  and  characterization  of  single  CTCs,  rather 
than  pooled  CTCs  that  are  contaminated  with  variable  numbers 
of  WBCs  (Fig.  1C,  Cell  Isolation,  Fig.  4A).  Furthermore,  inclusion 
of  DAPI  as  a  dead  cell  exclusion  marker  allows  discrimination  of 
intact  CTCs  from  damaged  CTCs  and  CTC  fragments  that  have 
been  observed  by  several  groups  using  the  CellSearch  [33]  and 
other  automated  microscopy  platforms  [19,34].  Although  we  have 
used  EpCAM  based  capture  and  cell  surface  staining  to  isolate  and 
identify  single  prostate  cancer  CTCs,  combinations  of  capture  and 
staining  antibodies  can  be  easily  reconfigured  for  use  with  the 
MagSweeper  to  isolate  CTCs  from  other  malignancies  or  to  isolate 
other  cell  types.  Finally,  single  cell  isolation  using  the  MagSweeper 
does  not  appreciably  alter  gene  expression.  While  single  cells 
pre-  and  post-MagSweeper  isolation  showed  expected  heterogeneity  in 
transcriptome  expression  patterns  from  cell  to  cell,  we  were  unable 
to  find  patterns  of  gene  expression  that  were  correlated  with 
MagSweeper  processing.  This  finding  suggests  that  cell-autonomous 
factors,  as  opposed  to  extrinsic  factors  such  as  MagSweeper  isolation, 
govern  gene  expression  at  the  level  of  the  single  cell  (Fig.  2B). 
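
The capture figure quoted above (81% ± 16%) is a mean with a spread across replicate spike-in experiments. As a rough sketch, assuming the spread is a sample standard deviation (the text does not say whether it is a standard deviation or a standard error), the summary could be computed as follows. The function name and the replicate counts are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def capture_summary(spiked: list[int], recovered: list[int]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of per-replicate capture efficiency in %."""
    if len(spiked) != len(recovered) or len(spiked) < 2:
        raise ValueError("need at least two replicates with matching counts")
    if any(s <= 0 for s in spiked):
        raise ValueError("each replicate must have a positive number of spiked cells")
    efficiencies = [100.0 * r / s for s, r in zip(spiked, recovered)]
    return mean(efficiencies), stdev(efficiencies)


if __name__ == "__main__":
    # Hypothetical replicates: cells spiked into blood and cells recovered.
    avg, spread = capture_summary(spiked=[50, 50, 50], recovered=[40, 45, 35])
    print(f"{avg:.0f}% ± {spread:.0f}%")
```

Computing the efficiency per replicate before averaging, rather than pooling all cells, keeps the spread interpretable as run-to-run variability.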

The  MagSweeper  isolation  protocol  appears  to  be  a  relatively 
gentle  method  for  isolating  CTCs,  with  a  mean  cell  attrition  rate 
for  spiked  LNCaP  of  1%.  Furthermore,  dead  and  membrane 
compromised  cells  are  stained  using  DAPI  allowing  identification 
and  isolation  of  live,  membrane  intact  cells,  and  those  cells  most 
likely  to  yield  intact  DNA  and  RNA.  With  this  protocol  single 
LNCaP,  PC-3  and  T24  cells  showed  high  quality  RNA  after 
amplification  as  judged  by  size  of  amplified  cDNA  (Fig.  2A)  and 
passing  mRNA-Seq  quality  control  metrics  for  alignment,  median 
alignment  CV  and  positional  coverage  in  a  handpicked  panel  of  600 
quality  control  genes  (Fig.  3A  and  3B).  Therefore,  we  suspect  that  the 
heterogeneity  in  RNA  quality  present  in  CTCs  isolated  from  patient 
samples  (Fig.  2C,  3A  and  3B)  is  due  to  features  of  CTC  biology  in  vivo 
and  not  due  to  technical  features  of  MagSweeper  CTC  isolation. 
Patients  with  the  highest  numbers  of  CTCs  tended  to  yield  CTCs 
with  better  RNA  quality.  Since  all  patients  in  this  study  were  on 
therapy  (Table  S4),  differences  in  RNA  quality  are  likely  related  to 
treatment  effects  or  host  factors  that  affect  CTC  viability  and 
apoptosis.  Degradation  of  mRNA  is  an  early  event  in  apoptosis, 
possibly  upstream  of  caspase  activation  [35],  and  therefore  might 
occur  in  dying  cells  that  are  physically  intact  as  judged  by  lack  of 
DAPI  staining.  Accumulating  evidence  suggests  that  apoptotic  CTCs 
are  routinely  isolated  from  cancer  patient  blood  samples.  Using  the 
CellSearch  platform,  FISH,  and  flow  cytometry  in  conjunction  with 
the  M30  antibody  which  recognizes  caspase  cleaved  CK18,  several 
groups  have  shown  that  a  significant  number  of  CTCs  isolated  from 
metastatic  prostate  cancer  patients  are  apoptotic  [34,36-38]. 
Detection  of  apoptotic  CTCs  is  not  platform  or  cancer  type  specific. 
Using  fiber  optic  array  scanning  technology  and  automated 
fluorescence  imaging,  many  CTCs  isolated  from  metastatic  breast 
cancer  patients  have  been  shown  to  be  morphologically  apoptotic  and  to  stain  positively  in 
fluorescent  TUNEL  assays  [19,33]. 

Despite  heterogeneity  of  CTC  RNA  quality,  we  were  able  to 
perform  single  cell  transcriptome  analysis  and  confirm  that  the 
CTCs  were  prostatic  in  origin.  Expression  of  the  androgen  receptor 
and  target  downstream  genes  such  as  KLK3  and  TMPRSS2  in  all 
but  one  of  the  CTCs  identifies  them  as  being  of  prostate  origin 
(Fig.  4A).  Furthermore,  none  of  the  CTCs,  or  single  LNCaP,  PC-3 
and  T24  cell  lines  sequenced  expressed  CD45,  a  marker  for  WBCs. 
Larger  studies  of  isolated  CTCs  will  be  necessary  to  understand  the 
degree  of  cell-to-cell  heterogeneity  in  gene  expression  as  well  as  the 
effects  of  RNA  quality  on  the  fidelity  of  transcript  levels  measured 
by  whole  transcriptome  sequencing  in  single  cells. 

Pathway  analysis  confirmed  activation  of  androgen  receptor 
(AR)  signaling  pathways,  which  are  known  to  be  central  to  prostate 
cancer  biology.  In  addition,  many  genes  associated  with  cell  cycle 
regulation  and  the  mitotic  spindle  are  up-regulated  in  the  CTCs. 
While  high  levels  of  expression  of  these  transcripts  might  be 
expected  in  malignant  cells,  it  is  notable  that  CTCs  showing  high 
levels  of  expression  of  spindle-associated  transcripts  were  derived 
from  patients  on  taxane  chemotherapy.  Taxanes  target  the  mitotic 
spindle  so  it  is  possible  that  these  transcripts  were  up-regulated  in 
response  to  chemotherapy  and  that  this  up-regulation  could 
mediate  response  or  resistance  to  taxane  chemotherapy.  Several  of 
the  transcripts  identified  in  the  CTCs  have  been  correlated  with 
aggressive  behavior  in  localized  prostate  cancer  (e.g.  PLK-1, 
TOP2A)  so  it  is  interesting  to  observe  these  markers  in  cells  from 
patients  with  highly  advanced  prostate  cancers  [39,40].  MAGEA1 
and  CTAG1B  show  a  complex  pattern  of  expression  in  samples  of 
prostate  carcinomas  [41].  MAGEC2  is  expressed  in  a  small 
percentage  of  primary  prostate  cancers  with  more  frequent 
expression  found  in  metastatic  and  castration  resistant  cancer 
[42].  Finally,  it  might  be  possible  to  use  CTCs  to  identify 
therapeutic  targets  for  advanced  prostate  cancer.  For  example, 
CTCs  from  2  patients  expressed  high  levels  of  BIRC5  transcripts. 
BIRC5  encodes  the  bi-functional  protein  survivin  which  has  both 
anti-apoptotic  and  mitotic  functions  in  a  cell  [43].  Survivin  has 
been  implicated  in  castrate  resistant  prostate  cancer  and  therapeutic 
antisense  RNA  to  survivin  shows  effectiveness  in  treating 
castrate  resistant  prostate  cancer  [44].  In  patient  1,  SPINK1  was 
identified  in  all  3  CTCs  analyzed,  and  accumulating  evidence 
suggests  this  defines  a  subtype  of  prostate  cancer  that  is  susceptible 
to  therapies  targeting  SPINK1  or  EGFR  (cetuximab)  [30]. 

Previous  molecular  characterizations  of  CTCs  targeted  one  or  a 
few  disease-associated  biomarkers.  For  example,  pooled  CTCs  have 
been  assayed  for  HER2  in  breast  cancer  using  FISH  [32], 
TMPRSS2-ERG  rearrangements  in  prostate  cancer  using  RT-PCR  [8], 
and  EGFR  mutations  in  non-small  cell  lung  cancer  [45]. 
Microarray-based  assessments  of  gene  expression  have  been  carried 
out  on  pools  of  prostate,  breast  and  colon  cancer  CTCs  [46]  and  whole 
genome  amplification  coupled  with  array  comparative  genomic 
hybridization  has  been  used  to  look  at  copy  number  variation  in  small 
pools  of  prostate  CTCs  [21].  With  the  exception  of  FISH,  no  previous 
technology  has  allowed  assessment  of  gene  expression  of  single  CTCs, 
and  pooling  of  samples  could  obscure  cell-to-cell  variations  in 
expression  that  are  biologically  interesting  and  important.  Our 
demonstration  that  mRNA-Seq  can  be  carried  out  on  single  CTCs, 
and  our  development  of  a  platform  that  allows  isolation  of  highly  pure 
individual  CTCs,  offer  an  opportunity  to  advance  understanding  of 
gene  expression  in  individual  CTCs  and  test  whether  CTC  genomic 
information  can  be  used  in  clinical  decision  making. 

Supporting  Information 

Table  S1  FPKM  values  for  23,139  RefSeq  RNAs  expressed 
in  LNCaP  cells  and  prostate  CTCs.  For  each  cell, 
the  FPKM  of  RefSeq  RNAs  were  reported. 